The present invention relates to the written non-phonetic characters of oriental languages, such as the Chinese, Japanese, Korean and Indian languages, and more particularly to a conversion method of creating new surrogate words that precisely represent the non-phonetic characters used in written oriental languages. In the present invention, the surrogate words are created with either English-style or native alphabets to represent the non-phonetic characters used in the Chinese, Japanese and Korean languages. Therefore, the non-phonetic characters can be easily input into a computer through an English-style or native alphabetic keyboard, a mouse or another phonetic input method. Moreover, such new surrogate words can be stored in a computer and precisely transmitted by E-mail (Electronic Mail).
The non-phonetic characters of the Chinese language were derived thousands of years ago from the pictures with which the ancient Chinese expressed themselves. Over their long history these characters have gone through many changes, from pictures of the subjects they described in ancient times to the uniformly square shapes of today. The Koreans and the Japanese adopted and incorporated Chinese characters into their languages, although they do not necessarily pronounce or use all the characters the same way the Chinese do. The majority of the characters used by these two peoples have the same or a similar meaning as in Chinese. Nowadays, most characters consist of two parts. One part denotes the meaning; it is usually referred to as a pictogram when it resembles something, or as an ideogram when it bears some of the meaning of the character. The other part denotes the pronunciation and is usually referred to as the phonetic radical. In the Chinese language, the pronunciation of a character is monosyllabic, meaning one syllable for each character.
An ideogram is a symbol, either a character or part of a character, that denotes the meaning of that character by inference. Pure ideograms are rare. However, ideograms can be found in many characters that have no phonetic radical but instead combine two or more pictograms to infer a meaning that readers can understand. The pronunciation of such characters must be memorized, since they contain no phonetic radical. When an ideogram is used as a radical of a character, it is silent. The following are some examples of ideograms.
(1) {character pullout} is made of the sun, {character pullout}, and the moon, {character pullout}; therefore it means bright. (2) {character pullout} consists of abundant, {character pullout}, and color, {character pullout}; therefore it means strikingly beautiful. (3) {character pullout} is the combination of a son, {character pullout}, and a daughter, {character pullout}; hence it means good. (4) {character pullout} combines silk, {character pullout}, and the small squares of a rice field, {character pullout}; therefore it means tiny and fine. (5) {character pullout} is made of two trees, {character pullout}; therefore it means woods.
A pictogram is a symbol, either a character or part of a character, that approximates the likeness of the object the character describes. Pictograms are more common than ideograms, since Chinese characters evolved from pictures. When a pictogram is used as part of a character, it is silent. Examples include {character pullout} for bird, {character pullout} for horse, and {character pullout} for wood or tree.
A pictogram not only bears the meaning of the character of which it is a part but also expresses that meaning by showing the physical likeness of the object the character describes. This allows the character to be easily recognized and understood.
A radical is a part of a character. A character usually contains two kinds of radicals: one denotes the meaning and the other denotes the pronunciation of the character. The former is known as a pictogram or an ideogram, depending on its shape or what it stands for. If its shape resembles an object, it is called a pictogram. If it does not resemble anything but has a meaning derived from other uses or from inference, it is called an ideogram. Such radicals remain silent when the character is pronounced. The other kind of radical, known as the phonetic radical, bears the actual or approximate pronunciation of the character and is therefore sounded.
Sometimes a character can be used as a radical, such as (1) {character pullout} in {character pullout} and (2) {character pullout} in {character pullout}. Radicals of this kind are mostly used as phonetic radicals. Conversely, a radical can very often be used as a character on its own, such as {character pullout}.
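The two-radical structure described above can be modelled as a small record: a silent meaning radical plus a sounded phonetic radical whose reading is either the character's actual reading or only an approximation of it. The Python sketch below is illustrative only; the class name `Character`, its fields, and the three sample entries (妈, 铜, 河) are assumptions chosen for this example, and tones are ignored so that "actual" means the same toneless pin yin syllable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    """Hypothetical record splitting a character into its two kinds of radicals."""
    glyph: str
    reading: str           # toneless pin yin of the whole character
    meaning_radical: str   # pictogram/ideogram: silent, carries the meaning
    phonetic_radical: str  # sounded part, carries the (approximate) pronunciation
    phonetic_reading: str  # toneless pin yin of the phonetic radical on its own

    def phonetic_is_exact(self) -> bool:
        """True when the phonetic radical gives the actual, not just approximate, sound."""
        return self.reading == self.phonetic_reading


# Illustrative entries; tones are deliberately ignored.
SAMPLES = [
    Character("妈", "ma", "女", "马", "ma"),      # mother: woman + ma
    Character("铜", "tong", "钅", "同", "tong"),  # copper: metal + tong
    Character("河", "he", "氵", "可", "ke"),      # river: water + ke (approximate)
]

for c in SAMPLES:
    kind = "actual" if c.phonetic_is_exact() else "approximate"
    print(f"{c.glyph}: meaning from {c.meaning_radical}, sound from {c.phonetic_radical} ({kind})")
```

Keeping the phonetic radical's own reading as a separate field makes the "actual or approximate pronunciation" distinction a simple comparison rather than a hard-coded flag.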
Another unique feature of spoken Chinese is that each phonetic syllable can be pronounced in four ways, i.e., with four intonations (tones). In total, the Chinese language has about 1,544 pronunciation/intonation combinations, compared with about 13,200 commonly used characters. Theoretically, each pronunciation/intonation combination would therefore represent about 8 to 9 characters. In reality, many pronunciation/intonation combinations are rarely or never used, while others, such as ji, qi and xi, are heavily over-used. Such uneven usage causes certain combinations to represent more than 50 characters each. The applicant calls this phenomenon over-representation, a problem that makes oriental languages (including the Chinese, Japanese, Korean and Indian languages) very difficult to computerize in their original forms.
For example, 99 Chinese characters, such as {character pullout}, {character pullout}, {character pullout}, {character pullout}, {character pullout}, etc., share the same pronunciation/intonation combination ji. Likewise, 69 Chinese characters, such as {character pullout}, {character pullout}, {character pullout}, {character pullout}, {character pullout}, etc., share the combination qi, and 67 Chinese characters, such as {character pullout}, {character pullout}, {character pullout}, etc., share the combination xi.
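To make the over-representation argument concrete, the short Python sketch below takes the figures quoted above at face value (13,200 characters, 1,544 combinations, and the homonym counts for ji, qi and xi) and compares each observed count with the baseline that a perfectly even distribution would give. It illustrates the arithmetic only; the function name `average_load` is an assumption for this example.

```python
# Figures quoted in the text above, taken at face value.
TOTAL_CHARACTERS = 13_200   # commonly used characters
TOTAL_COMBINATIONS = 1_544  # pronunciation/intonation combinations
OBSERVED = {"ji": 99, "qi": 69, "xi": 67}  # homonym counts for three syllables


def average_load(characters: int, combinations: int) -> float:
    """Characters per combination if usage were perfectly even."""
    return characters / combinations


avg = average_load(TOTAL_CHARACTERS, TOTAL_COMBINATIONS)
print(f"even distribution: {avg:.2f} characters per combination")
for syllable, count in OBSERVED.items():
    # Ratio of the observed homonym count to the even-distribution baseline.
    print(f"{syllable}: {count} characters, {count / avg:.1f}x the average")
```

With these numbers the baseline is about 8.55 characters per combination, so ji alone carries more than eleven times its even share.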
Currently, oriental languages such as the Chinese, Japanese, Korean and Indian languages use thousands of characters, in contrast to the 26 letters of the English alphabet; the computerization of these languages is therefore a substantial problem. Obviously, a typewriter keyboard with thousands of keys is completely impractical. Thus, inputting oriental characters into computers or word processors becomes an extremely difficult task.
Generally speaking, there are two major systems of computer input for oriental languages, i.e., the "shape" system and the "phonetic" system. The "shape" system, such as the "CHANGJEI" or "DA YI" input system for Chinese, designates a plurality of shape symbols according to the shapes of the radicals of the characters, in which each combination of shape symbols represents a unique character. The drawback of the "shape" system is that it is very difficult to learn and use. Users have to study the specific way of dividing each character into predetermined shape symbols and memorize the shape-symbol combinations for thousands of different characters. Although the "shape" system enables the user to input a specific character precisely into a computer or word processor, only a tiny portion of skilled people, such as professional typists who have received special and intensive training, can use it. Ordinary people are unable to input even one character with the "shape" system. Moreover, the learning process of the "shape" system is so complicated that most business people cannot spend the time required to memorize all of its input codes. In other words, the "shape" system is designed only for people whose career is computer data entry. Furthermore, the "shape" system inputs and stores each character as 2 bytes. However, during electronic transmission such as E-mail, the transmitting unit is a single byte only, so that information or data input by the "shape" system cannot be sent by E-mail over the Internet. In other words, oriental people whose written language is not a phonetic one like English have little or no chance to enjoy the convenience of E-mail and the Internet.
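The 2-byte storage problem can be illustrated with Python's built-in codecs. The sketch below assumes that the double-byte storage referred to above corresponds to legacy encodings such as GB2312 (mainland China) or Big5 (Taiwan), and that the transport accepts only 7-bit ASCII bytes, as early Internet mail did. Under those assumptions, each character occupies two bytes with the high bit set, which a 7-bit-only channel cannot pass through unchanged, whereas a plain alphabetic spelling such as "ji" can. The helper names `encoded_bytes` and `fits_7bit` are hypothetical.

```python
def encoded_bytes(char: str, codec: str) -> bytes:
    """Encode a single character with a legacy double-byte codec."""
    return char.encode(codec)


def fits_7bit(data: bytes) -> bool:
    """True if every byte lies in the 7-bit ASCII range (0x00 to 0x7F)."""
    return all(b < 0x80 for b in data)


for codec in ("gb2312", "big5"):
    raw = encoded_bytes("中", codec)
    print(codec, raw.hex(), len(raw), "bytes, 7-bit safe:", fits_7bit(raw))

# An alphabetic spelling stays within 7-bit ASCII.
spelled = "ji".encode("ascii")
print("ascii 'ji'", spelled.hex(), "7-bit safe:", fits_7bit(spelled))
```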
The "phonetic" system, such as the "PIN YIN" or "ZHUYIN" input system for Chinese, as shown in FIG. 1, enables the user to input the pronunciation of a character by typing into the computer the corresponding Latin-style alphabets, or the zhuyin zimu, adopted to represent the consonants and vowels of Chinese; therefore most people can use these methods without any training. Basically, pin yin is the Chinese term for spelling; here the term implies "spelling with Latin-style alphabets". The pin yin system generally refers to the Draft Plan of Chinese Language Phonetic Spelling announced by the Committee for Chinese Characters Reform in February 1956. These alphabets are listed alongside the zhuyin zimu in FIG. 1. The zhuyin zimu comprise thirty-six Chinese characters with very few strokes each, chosen by the Chinese Ministry of Education in the spring of 1913 to represent the consonants and vowels of the Chinese language. Zhuyin zimu are still commonly used in Taiwan to teach the pronunciation of Chinese characters. In mainland China, however, the pin yin system has replaced zhuyin zimu. Please refer to FIGS. 1 and 4.
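Because pin yin letters and zhuyin zimu both spell the consonants and vowels of a syllable, converting between them is a table lookup. The Python sketch below uses a deliberately partial table (a few single-letter initials and simple finals) and assumes toneless syllables of the form initial + final; it ignores cases such as zh/ch/sh or the ü value of u after j, q and x, and the function name `to_zhuyin` is hypothetical. FIG. 1 gives the correspondence the document relies on.

```python
# Partial, illustrative correspondence between pin yin letters and zhuyin zimu.
INITIALS = {"b": "ㄅ", "p": "ㄆ", "m": "ㄇ", "f": "ㄈ", "j": "ㄐ", "q": "ㄑ", "x": "ㄒ"}
FINALS = {"a": "ㄚ", "i": "ㄧ", "an": "ㄢ", "ao": "ㄠ", "ang": "ㄤ"}


def to_zhuyin(syllable: str) -> str:
    """Convert a simple toneless initial+final pin yin syllable to zhuyin zimu."""
    initial, final = syllable[0], syllable[1:]
    if initial not in INITIALS or final not in FINALS:
        raise KeyError(f"{syllable!r} is outside this partial table")
    return INITIALS[initial] + FINALS[final]


for s in ("ji", "qi", "xi", "ma", "bao"):
    print(s, "->", to_zhuyin(s))
```

Either spelling identifies only the sound, so "ji" typed in pin yin and ㄐㄧ typed in zhuyin lead to the same set of homonymous characters.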
As mentioned above, in oriental languages it is very common for a plurality of different characters to have the same pronunciation. In other words, a single set of pin yin or zhuyin zimu codes may represent a plurality of different characters. Therefore, after a set of pin yin or zhuyin zimu codes is keyed in, either the "PIN YIN" or the "ZHUYIN ZIMU" system presents numerous characters from which the user must select the exact character. For example, the pin yin of "{character pullout}" is "ji". Therefore, if the user would like to key in the character "{character pullout}", which means "and", the user can key in the letters j and i. However, approximately 99 Chinese characters, such as {character pullout}, {character pullout}, {character pullout}, {character pullout}, {character pullout}, etc., share the identical pronunciation/intonation combination ji. The user then needs to search for the precise character {character pullout} among the 99 homonymous characters that appear on the computer screen. Obviously, both the "PIN YIN" and "ZHUYIN" systems are cumbersome and impractical. Moreover, for the reason mentioned above, neither the "PIN YIN" system nor the "ZHUYIN" system is suitable for transmission by E-mail.
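The extra selection step can be sketched as a lookup from a typed spelling to a candidate list, followed by the user choosing one entry by position. The Python example below uses a hypothetical, heavily abridged homophone table with toneless pin yin (a real input method would list all of the roughly 99 "ji" characters); `HOMOPHONES`, `candidates` and `select` are illustrative names, not part of any existing input method.

```python
# Hypothetical, heavily abridged homophone table: toneless pin yin -> characters.
HOMOPHONES = {
    "ji": ["机", "鸡", "记", "及", "级", "几"],
    "qi": ["七", "起", "气", "其"],
}


def candidates(pinyin: str) -> list[str]:
    """Return the characters the user must choose among for a typed spelling."""
    return HOMOPHONES.get(pinyin, [])


def select(pinyin: str, index: int) -> str:
    """Simulate the user's extra step: picking one candidate by its list position."""
    options = candidates(pinyin)
    if not 1 <= index <= len(options):
        raise ValueError(f"no candidate {index} for {pinyin!r}")
    return options[index - 1]


menu = "  ".join(f"{i}.{ch}" for i, ch in enumerate(candidates("ji"), start=1))
print("ji ->", menu)
print("user picks 4 ->", select("ji", 4))  # 及, "and"
```

Typing "ji" alone never identifies 及; the user must always scan the menu, and the menu grows with every homonym the table holds.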
Nowadays, E-mail is the most efficient and commonly used tool for transmitting information. Large quantities of data and information can be transmitted all over the world instantly, and people can send or receive virtually unlimited information and knowledge through E-mail. However, the countries and people using oriental languages still have no input method whose output is suitable for E-mail. This unsolved situation may seriously hinder cultural and commercial development between Asian and Western societies.
Moreover, having existed for thousands of years, Chinese culture has produced huge numbers of idioms and proverbs that are quoted daily by hundreds of millions of people throughout East Asia and other places where the Chinese, Japanese and Korean languages are taught or used. Computer users who routinely process one of these languages would benefit greatly from a fast yet accurate means of inputting these frequently quoted phrases and sentences.
Existing methods for inputting commonly used phrases and sentences written in Chinese or Kanji characters consist of the following steps:
(1) Alphabetizing the Chinese or Kanji characters according to their pronunciation in the respective language; and
(2) Typing the first letter of the spelling of each Chinese or Kanji character of the phrase, forming an acronym, on the keyboard of a computer equipped with software that can interpret the acronyms and display the corresponding Chinese characters.
By contrast, the conversion method of the present invention creates a surrogate word for each character through the following steps:
(a) alphabetizing a pictographic/ideographic radical of each character according to its pronunciation in the respective language, with the resulting spelling then being used as a prefix for a newly created surrogate word;
(b) alphabetizing the character according to its pronunciation in the respective language, with the resulting spelling then being used as a spelled suffix for the newly created surrogate word; and
(c) combining the prefix and suffix together to form a surrogate word for the specific "character" used in the written form of the respective language.
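Steps (a) to (c) above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the invention's actual spelling rules: the radical assignments and readings in `RADICAL_AND_READING` are a hand-made, hypothetical table, spellings are toneless pin yin, and `surrogate_word` is an illustrative name. The example shows how three characters that all share the spelling "ji" receive distinct surrogate words once the radical's spelling is prefixed.

```python
# Hypothetical table: character -> (pin yin of its meaning radical, pin yin of the character).
# Radical readings are assumptions chosen for this sketch; tones are ignored.
RADICAL_AND_READING = {
    "机": ("mu", "ji"),    # radical 木, tree/wood
    "鸡": ("niao", "ji"),  # radical 鸟, bird
    "记": ("yan", "ji"),   # radical 言 (讠), speech
    "明": ("ri", "ming"),  # radical 日, sun
}


def surrogate_word(char: str) -> str:
    """Build a surrogate word from a radical spelling (prefix) and character spelling (suffix)."""
    radical_spelling, char_spelling = RADICAL_AND_READING[char]
    prefix = radical_spelling  # step (a): spelled radical becomes the prefix
    suffix = char_spelling     # step (b): spelled character becomes the suffix
    return prefix + suffix     # step (c): combine into one surrogate word


words = {ch: surrogate_word(ch) for ch in RADICAL_AND_READING}
print(words)
```

Here 机 becomes "muji", 鸡 becomes "niaoji" and 记 becomes "yanji", so the shared suffix "ji" no longer causes a collision, and every surrogate word uses only ASCII letters.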
This acronym method works well when the pronunciation of a phrase is unique, but in real life a large number of phrases, especially those containing fewer than four characters, have identical pronunciations (homonyms). To compensate for this large number of homonyms, software engineers design their programs to display the candidate phrases in Chinese characters at the bottom of the screen so that the typist can select the desired one. If the desired phrase or sentence is not there, the typist can press the down arrow key to display the next phrase or sentence until the desired one is found. This searching or selection process makes the existing method cumbersome, time-consuming and sometimes frustrating.
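The collision problem of the acronym method is easy to reproduce. The Python sketch below builds an index from acronyms (the first letter of each character's toneless pin yin, as in step (2)) to phrases, using a small hypothetical phrase table. Four common two-character words all reduce to "sj", so the software must show a list, while the four-character idiom 有备无患 reduces to a unique "ybwh". The names `PHRASES`, `acronym` and `build_index` are illustrative.

```python
from collections import defaultdict

# Hypothetical phrase table: phrase -> toneless pin yin of each character.
PHRASES = {
    "时间": ["shi", "jian"],  # time
    "世界": ["shi", "jie"],   # world
    "设计": ["she", "ji"],    # design
    "数据": ["shu", "ju"],    # data
    "有备无患": ["you", "bei", "wu", "huan"],
}


def acronym(spellings: list[str]) -> str:
    """Step (2): take the first letter of each character's spelling."""
    return "".join(s[0] for s in spellings)


def build_index(phrases: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each acronym to every phrase that produces it, in table order."""
    index = defaultdict(list)
    for phrase, spellings in phrases.items():
        index[acronym(spellings)].append(phrase)
    return dict(index)


index = build_index(PHRASES)
for code, matches in index.items():
    note = "unique" if len(matches) == 1 else "must choose from list"
    print(code, matches, note)
```

Short phrases collide far more often because an acronym of two letters has far fewer possible values than one of four.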
Furthermore, learning and memorizing Chinese or Kanji characters is so inefficient that even an average fourth-grade student in China or Taiwan cannot express himself or herself clearly in Chinese characters. This has been largely blamed on the "complexity" of Chinese characters, an explanation the public has accepted for over a thousand years. The situation can be improved dramatically only with a visual, audio or multimedia tool that expedites the learning process and consolidates what has already been learnt.